A relatively small cadre of lineage-restricted transcription factors largely
orchestrates erythropoiesis, but how these nuclear factors interact to regu- 
late this complex biology is still largely unknown. However, recent techno- 
logical advances, such as chromatin immunoprecipitation (ChIP) paired with 
massively parallel sequencing (ChIP-seq), gene expression profiling, and
comprehensive bioinformatic analyses, offer new insights into the intricacies 
of red cell molecular circuits. 
Red blood cells (RBCs) fulfill the
essential functions of transporting 
oxygen to tissues and facilitating gas 
exchange in the lungs. They are con- 
tinuously produced throughout life in 
a tightly controlled growth process 
termed erythropoiesis. Erythroid dif- 
ferentiation is accompanied by tempo- 
rally regulated changes in cell surface 
protein expression, a reduction in cell 
size, progressive hemoglobinization, 
and nuclear condensation, which cul- 
minates in extrusion of the nucleus, 
RNA, and mitochondria (Richmond 
et al., 2005).

Erythropoiesis is largely mediated 
by a relatively small number of lineage- 
restricted transcription factors, including 
GATA-1, SCL/TAL1, LMO2, LDB1,
and KLF1 (Cantor and Orkin, 2002). 
The importance of these transcription 
factors in erythropoiesis has been dem- 
onstrated unequivocally by cell-based 
ex vivo assays, as well as in knockout 
mouse models and rare patients with 
anemias. The critical transcription fac- 
tors are present in diverse multiprotein 
complexes. However, how distinct 
multiprotein complexes activate or re- 
press transcription, and thereby regu- 
late the erythroid maturation program, 
remains incompletely understood. New 
techniques, including ChIP coupled 
with massively parallel sequencing 
(ChIP-seq), gene expression profiling,
and bioinformatic analyses, provide 
new information about the regulatory 
networks that coordinate erythroid cell 
maturation and function. This mini- 
review will summarize recent findings 
relevant to the understanding of gene 
expression regulation in red blood cells. 



GATA-1 

The transcription factor GATA-1 rec- 
ognizes the DNA consensus sequence 
(A/T)GATA(A/G) through two
Cys-X2-Cys-X17-Cys-X2-Cys zinc fingers
that are characteristic of the GATA
family (Wall et al., 1988; Evans and
Felsenfeld, 1989). Annotation of GATA
consensus sites, even those that are 
phylogenetically conserved, is a poor 
predictor of in vivo GATA-1 chromatin 
binding (Bresnick et al., 2005). Hence, 
several groups generated whole-genome 
occupancy maps for GATA-1 by using 
ChIP-seq in erythroid cell lines (Cheng
et al., 2009; Fujiwara et al., 2009; Yu
et al., 2009; Soler et al., 2010). Although
three studies identified ~4,000–6,000
in vivo binding sites for GATA-1 in
mouse erythroleukemia (MEL) cells
expressing a tagged form of GATA-1 (Yu
et al., 2009; Soler et al., 2010) or human
K562 erythroleukemia cells (Fujiwara
et al., 2009), a fourth study identified
>15,000 sites occupied by GATA-1 in
G1E-ER4 cells, which were derived from 
GATA-1 knockout mouse embryonic 
stem cells and express an estrogen- 
inducible GATA-1 construct. Careful 
assessment of the data may help explain 
discrepancies in the number of
GATA-1–occupied sites. These may have
arisen from the use of different cell lines,
the use of different peak-calling
algorithms, differences in the ChIP
protocols, or simply different choices of
statistical cutoffs.
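
As a point of reference for why consensus annotation alone overpredicts binding, the (A/T)GATA(A/G) motif can be located in a sequence with a simple pattern search. The sketch below is illustrative only: the function name `find_gata_sites` and the example sequence are hypothetical and are not taken from any of the cited studies. A scan of this kind reports every sequence match, which is exactly the set that ChIP-seq shows to be only partly occupied in vivo.

```python
import re

# IUPAC form of the consensus quoted in the text: (A/T)GATA(A/G) = WGATAR.
# Lookahead is used so that overlapping matches are all reported.
FORWARD = re.compile(r"(?=([AT]GATA[AG]))")
# The reverse complement of WGATAR is YTATCW, i.e. (C/T)TATC(A/T).
REVERSE = re.compile(r"(?=([CT]TATC[AT]))")


def find_gata_sites(seq):
    """Return sorted (start, strand, match) tuples for every consensus hit.

    Positions are 0-based offsets on the forward strand.
    """
    seq = seq.upper()
    hits = [(m.start(), "+", m.group(1)) for m in FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in REVERSE.finditer(seq)]
    return sorted(hits)


# Hypothetical toy sequence with one site on each strand.
example = "CCAGATAAGGCTTATCTGG"
sites = find_gata_sites(example)
```

Motif hits obtained this way are candidates only; whether a given site is bound has to be established experimentally.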

All studies demonstrated that a minority
of GATA-1 binding sites (~10–15%)
are located at proximal promoter
regions close to the transcription start
site (TSS). The bulk of GATA-1 binding
(~85%) occurs at distal regulatory
elements with equal distribution between
intra- and intergenic regions
(Fujiwara et al., 2009; Yu et al., 2009).
High-level H3K4 monomethylation
(H3K4me1), a histone mark strongly
enriched at functional enhancer regions
(Heintzman et al., 2007), was observed
at nearly all GATA-1–occupied
DNA segments, further supporting the
notion that GATA-1 principally binds
enhancer regions (Cheng et al., 2009).
To identify direct GATA-1 target genes,
microarray gene expression profiling
was performed (Yu et al., 2009) using
G1E-ER4 cells (Weiss et al., 1997). G1E
cells are arrested at the proerythroblast
stage of differentiation, but undergo
synchronous terminal maturation upon
restoration of GATA-1 function (Weiss
et al., 1997). Reexpression of GATA-1
triggers an extensive program of gene
activation and repression (Weiss et al.,
1997). Superimposition of GATA-1
whole-genome occupancy and gene
expression data permitted identification
of putative, direct GATA-1 targets.
Although up to 5,000 genes were found to
be differentially expressed upon GATA-1
activation (Cheng et al., 2009; Fujiwara
et al., 2009; Yu et al., 2009), a
surprisingly small fraction (~300–700)
of genes could be identified as direct
GATA-1 target genes (Fujiwara et al.,
2009; Yu et al., 2009). It should also
be noted that within those genes
identified as direct GATA-1 targets, 40–57%
were up-regulated and 41–60% were
down-regulated (Cheng et al., 2009;
Fujiwara et al., 2009; Yu et al., 2009),
demonstrating that GATA-1 activates or
represses nearly equivalent numbers of
genes. Bioinformatic analysis of
transcription factor motifs further revealed that
among activated genes, binding sites
for SCL/TAL1 were highly enriched
(Cheng et al., 2009; Fujiwara et al.,
2009; Tripic et al., 2009; Yu et al.,
2009; Kassouf et al., 2010). Based on
this finding, one may infer that GATA-1
activates gene expression specifically in
concert with SCL/TAL1 (Fig. 1).
However, partners for GATA-1 in gene
repression are less clear. GATA-1 is thought to
facilitate gene repression via interaction
with the NuRD complex; this may be
mediated through a direct interaction
between GATA-1 and FOG-1 (Hong et al.,
2005; Rodriguez et al., 2005), as well
as via the transcriptional repressor Gfi-1b
in concert with the LSD1–CoREST
corepressor complex (Fig. 1; Rodriguez
et al., 2005; Saleque et al., 2007).
Interestingly, the genome-wide occupancy maps
revealed an additional level of complexity,
as a subset of GATA-1–repressed genes
was also found to carry the repressive
H3K27me3 histone mark (Cheng et al.,
2009; Yu et al., 2009). This mark is
catalyzed by the polycomb repressive
complex 2 (PRC2), a multiprotein complex
containing EED, Ezh1/2, and Suz12
(Müller et al., 2002; Schuettengruber
et al., 2007). Erythroid differentiation
is impaired in mice with
erythroid-specific loss of EED (Yu et al., 2009).
Thus, the PRC2 complex participates
in GATA-1–mediated gene repression
during erythroid differentiation.
Whether GATA-1 recruits PRC2
directly, or indirectly, will be of interest
in future studies.

Figure 1. Model of the multiprotein complexes orchestrating gene expression or repression
in erythroid cells. Comparison of GATA-1, SCL/TAL1, and LDB1 whole-genome occupancy maps
with gene expression profiling data suggests that the GATA-1/SCL/TAL1-LMO2-LDB1-E2A
pentameric complex, as well as a GATA-1-independent SCL/TAL1-containing complex, largely activate gene
expression. GATA-1 may also activate gene expression in coordination with KLF1 (activating
complexes, green box). GATA-1 might repress gene expression via a multi-step process. Interaction with
the transcriptional repressor GFI-1B recruits the LSD1/CoREST complex, which results in removal of
the activating H3K4me2 mark. To permanently silence gene expression, GATA-1 can recruit the PRC2
complex (EED, Ezh2, and Suz12) resulting in H3K27 trimethylation and gene repression. The SCL/TAL1
complex can recruit the corepressors ETO2 and Mtgr1 resulting in SCL/TAL1-mediated gene silencing
(repressing complexes, red box).
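
The superimposition of occupancy and expression data described above amounts to a set intersection followed by a split on the direction of change. The following is a minimal sketch under stated assumptions: the function name `putative_direct_targets`, the gene labels, and the fold-change values are all hypothetical placeholders, and the cited studies applied their own statistical thresholds and peak-to-gene assignment rules, which are not reproduced here.

```python
def putative_direct_targets(occupied_genes, expression_changes):
    """Split occupied, differentially expressed genes by direction of change.

    occupied_genes: set of gene names with at least one assigned ChIP-seq peak.
    expression_changes: dict mapping gene name -> log2 fold change after the
        factor is restored (assumed to be pre-filtered for significance).
    Returns (activated, repressed) as two sets of gene names.
    """
    activated, repressed = set(), set()
    for gene, log2_fold_change in expression_changes.items():
        if gene not in occupied_genes:
            continue  # expression changes but no peak: not called direct
        if log2_fold_change > 0:
            activated.add(gene)
        elif log2_fold_change < 0:
            repressed.add(gene)
    return activated, repressed


# Hypothetical toy input: four changed genes, two of them occupied.
occupied = {"geneA", "geneB", "geneE"}
changes = {"geneA": 2.1, "geneB": -1.4, "geneC": 3.0, "geneD": -0.8}
activated, repressed = putative_direct_targets(occupied, changes)
```

Because only genes present in both inputs survive, the number of putative direct targets can be far smaller than the number of differentially expressed genes, as the studies discussed here report.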

It should be recognized that these
chromatin occupancy studies do not
account for posttranslational modifications
of GATA-1. For example, GATA-1 is
acetylated (Boyes et al., 1998), and this
modification appears to be important
for erythroid differentiation (Hung et al.,
1999; Lamonica et al., 2006). A study
recently published in JEM revealed the
importance of GATA-1 SUMOylation
(Yu et al., 2010b). Genetic ablation of
the SUMO-specific protease SENP1
resulted in severe anemia and embryonic
lethality in mice at embryonic day
13.5. Accumulation of a SUMOylated
form of GATA-1 was observed and
coincided with down-regulation of
GATA-1 target genes (Yu et al., 2010b).
SUMOylation may modulate aspects of
GATA-1 function beyond DNA binding,
as suggested by Yu et al. (2010b),
given that SUMOylation of FOG-1
affects its interaction with other proteins
(Snow et al., 2010). Further work
is needed to interrogate protein–protein
and protein–DNA interactions of
SUMOylated GATA-1.

In recent years, microRNAs 
(miRNAs) have emerged as additional 
regulators of overall gene expression, 
representing yet another layer of con- 
trol. Indeed, recent work demonstrates 
that the miR-144/451 locus is a direct
target of GATA-1 and that mice lacking
miR-144/451 or miR-451 alone
show impaired erythropoiesis, particularly
under conditions of stress (Dore
et al., 2008; Rasmussen et al., 2010;
Patrick et al., 2010; Yu et al., 2010a).

SCL/TAL1-LMO2-LDB1-E2A complex

The basic helix–loop–helix (bHLH)
transcription factor SCL/TAL1 recog- 
nizes a short consensus DNA motif 
(CANNTG), the E-box. SCL/TAL1 
expression largely parallels that of 
GATA-1, as it is expressed in erythroid 
cells, megakaryocytes, and mast cells 
(Cantor and Orkin, 2002). In erythroid 
cells, SCL/TAL1 forms a complex with 
the ubiquitous bHLH protein E2A, 
and also with the LIM domain-containing
cofactors LMO2 and LDB1
(Cantor and Orkin, 2002). These pro- 
teins interact with GATA-1 to form a 
pentameric complex (Fig. 1) that binds 
to composite E-box/GATA-1 DNA 
motifs spaced 9–11 nt apart (Wadman
et al., 1997; Cohen-Kaminsky et al.,
1998). LMO2, GATA-1, and SCL/TAL1
are all required for erythropoiesis in 
mice (Cantor and Orkin, 2002), and 
a conditional knockout mouse model 
of SCL/TAL1 is available (Mikkola 
et al., 2003). In this issue, Li et al.
present the first conditional knockout of
LDB1. They find that embryos lacking
LDB1 show defective primitive
erythropoiesis and that Mx-Cre–driven
deletion of LDB1 in adult mice results in
a persistent drop in hematocrit and,
ultimately, death, demonstrating that
LDB1 is continuously required for
definitive erythropoiesis.
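
The composite E-box/GATA motif mentioned above can be expressed as a single pattern. This is a rough sketch and rests on assumptions that the text does not specify: the E-box (CANNTG) is taken to lie upstream of the GATA core on the same strand, and the 9–11 nt spacing is interpreted as the number of nucleotides between the end of the E-box and the start of GATA. The function name `find_composite_motifs` and the test sequences are hypothetical.

```python
import re

# Assumed layout: E-box (CANNTG), then a 9-11 nt gap, then the GATA core.
# The lookahead lets matches that start at different positions overlap.
COMPOSITE = re.compile(r"(?=(CA[ACGT]{2}TG[ACGT]{9,11}GATA))")

EBOX_LENGTH = 6
GATA_LENGTH = 4


def find_composite_motifs(seq):
    """Return (start, gap) for each E-box start that has a GATA 9-11 nt away.

    Only the forward strand and the E-box-first orientation are scanned.
    The reported gap is the one the regex engine matched at that start.
    """
    seq = seq.upper()
    results = []
    for match in COMPOSITE.finditer(seq):
        gap = len(match.group(1)) - EBOX_LENGTH - GATA_LENGTH
        results.append((match.start(), gap))
    return results


# Hypothetical sequences: a 9-nt gap (matches) and a 5-nt gap (does not).
with_motif = "CAGGTG" + "T" * 9 + "GATAAG"
without_motif = "CAGGTG" + "T" * 5 + "GATAAG"
```

A fuller scan would also cover the reverse strand and the opposite motif order; those cases are left out to keep the sketch short.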



For some time, only a handful of
red cell–specific direct target genes of
this complex had been identified. Two
recent studies mapped whole-genome
occupancy of this complex by performing
ChIP-seq for endogenous SCL/TAL1
in primary mouse proerythroblasts
(Kassouf et al., 2010) or for tagged
LDB1 and SCL/TAL1 in MEL cells
(Soler et al., 2010). A third group
generated an occupancy map of SCL/TAL1
in G1E-ER4 cells, performing
ChIP-on-chip analysis using a tiling array
covering mouse chromosome 7 (Tripic
et al., 2009). Approximately 3,000–4,000
and 5,000 genome-wide binding
sites were identified for SCL/TAL1
and LDB1, respectively. Approximately
30% of all SCL/TAL1 binding sites
were located at proximal promoter
regions (in this study defined as +3 kb of
the TSS), whereas the bulk of SCL/TAL1
binding (~70%) resided at distal
regulatory elements with a distribution
of 40 or 25% in intragenic or intergenic
regions, respectively (Kassouf et al.,
2010). To identify putative direct SCL/TAL1
target genes, microarray gene
expression profiling was used to compare
wild-type primary proerythroblasts
with proerythroblasts derived from
mice carrying a mutation in the
DNA-binding domain of SCL/TAL1
(SCL/TAL1 RER; Kassouf et al., 2008). 511
differentially expressed genes were
identified, with 51% up-regulated and 
49% down-regulated. The intersection 
of SCL/TAL1 occupancy and gene 
expression data resulted in an overlap 
of only 83 genes, which may be con- 
sidered direct SCL/TAL1 targets. 
Strikingly, 75% of these genes were
down-regulated in the mutant cells as
compared with wild-type cells, indicating
that SCL/TAL1 largely activates gene
expression. Analysis of motifs revealed
enrichment of GATA binding sites close to
SCL/TAL1 binding sites at genes activated
by SCL/TAL1, in accordance with the
reciprocal findings for GATA-1 (see
above). Gene repression mediated by
the SCL/TAL1 complex may be performed
via recruitment of the corepressors
ETO2 and Mtgr1 (Fig. 1; Fujiwara
et al., 2009; Tripic et al., 2009; Soler
et al., 2010). This conclusion is supported
by cooccupancy of ETO2/Mtgr1 at a
subset of SCL/TAL1 target genes (Soler
et al., 2010), as well as de-repression of
some SCL/TAL1 target genes upon
depletion of ETO2 in erythroid cells
(Tripic et al., 2009). The observation
that SCL/TAL1 and LDB1 have been
found binding far from their closest
repressed gene prompted Soler et al.
(2010) to perform chromosome conformation
capture sequencing (3C-seq).
Combination of the LDB1 ChIP-seq
and 3C-seq data revealed direct binding
of LDB1 to DNase-hypersensitive sites
HS2, HS3, and HS4 of the β-globin
locus control region (LCR) and long-range
interactions with the β-globin
promoter, despite the absence of a functional
LDB1 binding site at the β-globin
promoter (Soler et al., 2010). It would
be of interest to study the nature of these
long-range interactions in the absence
of SCL/TAL1 or LDB1.

KLF1 

KLF1 (formerly called EKLF), a zinc 
finger transcription factor with three 
highly similar C-terminal C2H2-type 
Krüppel zinc fingers, recognizes a subset
of CACC box motifs (Miller and
Bieker, 1993). Expression of KLF1 is 
remarkably restricted to erythroid cells 
and their precursors (Miller and Bieker, 
1993). Although its essential role in eryth- 
ropoiesis has been known for quite some 
time (Cantor and Orkin, 2002), few 
direct transcriptional targets have been 
identified. 

Tallack et al. (2010) generated
a whole-genome occupancy map for
KLF1 in primary erythroid cells. In two
independent ChIP-seq runs, KLF1
occupied between 940 and 1,400 binding
sites in erythroid cells. 16% of these
binding events occurred within 1 kb of
the TSS, whereas the majority of sites
were located at distances of >10 kb
away from TSSs (Tallack et al., 2010).
To identify new direct KLF1 target
genes, the authors compared ChIP-seq
data with gene expression profiles of
wild-type and Klf1-/- fetal liver cells
(Hodge et al., 2006). A total of 1,099
genes were differentially expressed in
the absence of KLF1 in erythroid cells;
730 genes were down-regulated and
369 genes were up-regulated (Hodge
et al., 2006). Only ~19% of these genes
were occupied by KLF1. The bulk of
binding events occurred at genes that 
were down-regulated in the absence of 
KLF1, suggesting that KLF1 acts pri- 
marily as a transcriptional activator 
(Tallack et al., 2010). The authors
investigated a potential functional
DNA-dependent interaction between KLF1
and GATA-1. Comparing KLF1 ChIP-seq
data with results from GATA-1 whole-genome
occupancy maps, Tallack et al.
determined, for each KLF1 peak, the
distance to the nearest GATA-1 peak.
Approximately 48% of KLF1
peaks are located within 1 kb of GATA-1
peaks, strongly supporting an in vivo
cooperation of the two factors (Fig. 1).
Finally, the authors compared
GATA-1/SCL-cooccupied (Cheng et al., 2009;
Wilson et al., 2009) and
GATA-1/KLF1-cooccupied
regions and found minimal overlap
(Tallack et al., 2010). This finding was
surprising, given studies implicating
GATA-1 in gene activation almost
exclusively in complex with SCL/TAL1
(see above), but it suggests that GATA-1
may exist in two mutually exclusive
activating complexes (Fig. 1).
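
The peak-proximity comparison described above (the share of KLF1 peaks lying within 1 kb of a GATA-1 peak) can be illustrated with a nearest-neighbor lookup over sorted positions. This is a sketch under assumptions, not the procedure used by Tallack et al.: each peak is reduced to a single coordinate (for example its summit), the function name `fraction_near` is hypothetical, and the coordinates are invented toy values.

```python
from bisect import bisect_left
from collections import defaultdict


def fraction_near(query_peaks, reference_peaks, max_dist=1000):
    """Fraction of query peaks with a reference peak within max_dist bp.

    Each peak is a (chromosome, position) tuple; only peaks on the same
    chromosome are compared.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in reference_peaks:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()

    near = 0
    for chrom, pos in query_peaks:
        positions = by_chrom.get(chrom, [])
        i = bisect_left(positions, pos)
        # The nearest reference peak is either just left or just right of i.
        candidates = positions[max(i - 1, 0):i + 1]
        if any(abs(pos - c) <= max_dist for c in candidates):
            near += 1
    return near / len(query_peaks) if query_peaks else 0.0


# Hypothetical toy coordinates standing in for two factors' peak lists.
query = [("chr1", 1000), ("chr1", 5000), ("chr2", 100), ("chr3", 50)]
reference = [("chr1", 1500), ("chr1", 9000), ("chr2", 2000)]
share = fraction_near(query, reference)
```

Sorting the reference positions once and using binary search keeps the lookup fast even for tens of thousands of peaks.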

Concluding remarks 

In considering genome-wide occu- 
pancy data, one must be cognizant of 
potential methodological pitfalls. For 
example, although the widely applied 
"nearest-neighbor approach" (Kent et al., 
2002; Pepke et al., 2009) provides a
convenient way to assign transcription
factor–binding peaks to nearby genes, it
may oversimplify the situation, as it
does not take into account long-range
cis or trans interactions that frequently
occur between promoter and enhancer 
elements. This limitation may account, 
in part, for the relatively small overlap 
between gene expression and transcrip- 
tion factor occupancy data. 
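
The nearest-neighbor approach discussed above can be sketched in a few lines, which also makes its limitation concrete. The function name `assign_nearest_gene`, the gene labels, and the coordinates are hypothetical, and a single TSS per gene is assumed for simplicity.

```python
def assign_nearest_gene(peaks, tss_by_gene):
    """Assign each peak to the gene with the closest TSS on its chromosome.

    peaks: list of (chromosome, position) tuples.
    tss_by_gene: dict mapping gene name -> (chromosome, TSS position).
    Returns a list of (gene, distance) tuples, or (None, None) when no gene
    is annotated on the peak's chromosome.
    """
    assignments = []
    for chrom, pos in peaks:
        best = None
        for gene, (gene_chrom, tss) in tss_by_gene.items():
            if gene_chrom != chrom:
                continue
            distance = abs(pos - tss)
            if best is None or distance < best[1]:
                best = (gene, distance)
        assignments.append(best if best is not None else (None, None))
    return assignments


# Hypothetical annotation and peaks.
tss = {"geneA": ("chr1", 10000), "geneB": ("chr1", 60000), "geneC": ("chr2", 500)}
peaks = [("chr1", 12000), ("chr1", 50000), ("chr3", 100)]
assigned = assign_nearest_gene(peaks, tss)
```

In this toy case the peak at position 50,000 is assigned to geneB purely because it is closer on the linear genome; an enhancer that actually contacts geneA through chromatin looping would be misassigned, which is the oversimplification noted above.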

Nevertheless, whole genome map- 
ping of transcription factor occupancy 
is a relatively new technology that is 
providing prodigious datasets for com- 
putational and functional analyses. The 
integration of such data with profiling 
of mRNA and miRNA expression, coupled
with sensitive proteomics, will lead
to an enhanced view of the transcriptional
networks orchestrating erythroid
differentiation. The basic principles elu- 
cidated in these studies will inform tran- 
scriptional biology more broadly. 